Abstract: This work discusses the problem of sparse signal
recovery when there is correlation among the values of the
non-zero entries. We examine intra-vector correlation in the context
of the block sparse model and inter-vector correlation in the
context of the multiple measurement vector model, as well as their
combination. Algorithms based on sparse Bayesian learning
are presented and the benefits of incorporating correlation at
the algorithm level are discussed. The impact of correlation on
the limits of support recovery is also discussed, highlighting the
different impact intra-vector and inter-vector correlations have
on such limits.

I. Introduction 

The problem of sparse signal recovery has many potential 
applications [1], [2] and has received much attention in recent 
years with the development of compressed sensing (CS) [3], 
[4]. The general Multiple Measurement Vector (MMV) model 
is given by [5] 

$$Y = \Phi X + V. \tag{1}$$

Here $Y = [Y_{\cdot 1}, \cdots, Y_{\cdot L}] \in \mathbb{R}^{N \times L}$ is an available
measurement matrix consisting of L measurement vectors.
$\Phi \in \mathbb{R}^{N \times M}$ ($N \ll M$) is a known matrix, and any N columns of
$\Phi$ are linearly independent. $X = [X_{\cdot 1}, \cdots, X_{\cdot L}] \in \mathbb{R}^{M \times L}$ is
an unknown and full column-rank matrix of interest. A key
assumption here is that X has only a few non-zero rows. V is
a noise matrix. The special case of L = 1 is the widely studied
Single Measurement Vector (SMV) problem in CS, and in this
context we use x to denote the vector of interest.
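As an illustration of model (1), the following minimal sketch draws a synthetic MMV problem. The function name `make_mmv`, the default sizes, the Gaussian entries of the non-zero rows, and the noise level are all assumptions made for this example; none of them is prescribed by the model itself.

```python
import numpy as np

def make_mmv(N=25, M=125, L=4, K=5, noise_std=0.01, seed=0):
    """Draw (Y, Phi, X, support) from Y = Phi X + V with K non-zero rows in X."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((N, M))
    Phi /= np.linalg.norm(Phi, axis=0)            # normalize columns to unit norm
    support = np.sort(rng.choice(M, size=K, replace=False))
    X = np.zeros((M, L))
    X[support] = rng.standard_normal((K, L))      # common sparsity profile across columns
    V = noise_std * rng.standard_normal((N, L))   # additive noise matrix
    Y = Phi @ X + V
    return Y, Phi, X, support

Y, Phi, X, support = make_mmv()
print(Y.shape, Phi.shape, X.shape)                # (25, 4) (25, 125) (125, 4)
```

Setting `L=1` in this sketch gives an instance of the SMV problem.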

II. Structure in X 

In the basic SMV and MMV models no additional as- 
sumptions are usually made. However, in many applications 
additional structure on X is available and we now discuss a 
few of them. 

(1) For the SMV problem, in contrast to the usual assump- 
tions that the locations of non-zero entries are independently 
and uniformly distributed, some dependency in the locations 
is assumed [6]-[8]. Incorporating this structure is important 
from an application point of view and this structure can be 
exploited to improve the performance of algorithms. 

(2) In the SMV problem a widely studied structure is 
block/group structure [9], [10]. With this structure, x can be 



viewed as a concatenation of g blocks, i.e.

$$x = [\underbrace{x_1, \cdots, x_{d_1}}_{\mathbf{x}_1^T}, \cdots, \underbrace{x_{d_{g-1}+1}, \cdots, x_{d_g}}_{\mathbf{x}_g^T}]^T \tag{2}$$

where $d_i$ ($\forall i$) are not necessarily the same. Among the g
blocks, only k blocks are nonzero, where $k \ll g$. This can
be viewed as a special case of modeling the distribution of 
the locations of the non-zero entries, but is worthy of special 
attention because of its application potential. In general, no ad- 
ditional assumption is made about the entries in each nonzero 
block. Motivated by applications, it appears reasonable to 
assume that the entries in each non-zero block are correlated 
[11], [12]. We refer to this as intra-block correlation and will 
discuss it in detail in Section III-A.

(3) In the basic MMV problem, the typical assumption made 
is that the vectors in X share a common sparsity profile. 
This leads to non-zero rows in X. One can impose additional 
structure. One possibility is dependency in the locations
of the non-zero rows. Another is correlation between the
entries in each of the non-zero rows [13], [14]. We refer to
this correlation as inter-vector correlation and will discuss it in
Section III-B.

(4) One can combine the above-mentioned two types of 
structure and consider the problem of block sparsity in the 
MMV problem. This leads to the consideration of correlated 
non-zero blocks of rows in X. The challenge in this context is 
efficiently modeling and estimating the correlation structure. 

(5) The time-varying sparsity model is a natural extension
of the MMV model [15]-[17]. It considers the case when the
support of each column of X is time-varying. The time-varying 
structure calls for modeling both the variation in the number 
and locations of the non-zero entries as well as the correlation 
of the non-zero entries. 

III. Intra-Vector and Inter-Vector Correlation

A. Intra-Vector Correlation 

For the SMV problem with the block structure (2), a number
of algorithms have been proposed, such as the Group Lasso
[9]. But few consider correlation within each block $\mathbf{x}_i$ ($\forall i$),
namely the intra-block correlation.



To exploit the intra-block correlation, we have proposed
the block sparse Bayesian learning (bSBL) framework [11], an
extension of the basic SBL framework [18]. We review it in
the following.

In this framework, each block $\mathbf{x}_i \in \mathbb{R}^{d_i \times 1}$ is assumed to
satisfy a parameterized multivariate Gaussian distribution:

$$p(\mathbf{x}_i) \sim \mathcal{N}(0, \gamma_i B_i), \quad \forall i \tag{3}$$

Here $\gamma_i$ is a nonnegative parameter controlling the block-sparsity
of x. When $\gamma_i = 0$, the i-th block becomes zero. During
the learning procedure most $\gamma_i$ tend to be zero, due to the
mechanism of automatic relevance determination [18]. Thus,
sparsity at the block level is encouraged. $B_i \in \mathbb{R}^{d_i \times d_i}$ is a
positive definite matrix, capturing the correlation structure within
the i-th block $\mathbf{x}_i$. The prior of x is $p(x) \sim \mathcal{N}(0, \Sigma_0)$, where
$\Sigma_0$ is a block-diagonal matrix with each principal block given
by $\gamma_i B_i$. Assume the noise vector satisfies $p(v) \sim \mathcal{N}(0, \lambda I)$,
where $\lambda$ is a positive scalar. Therefore, the posterior of x
is given by $p(x | y; \lambda, \{\gamma_i, B_i\}_{i=1}^g) = \mathcal{N}(\mu_x, \Sigma_x)$ with
$\mu_x = \Sigma_0 \Phi^T (\lambda I + \Phi \Sigma_0 \Phi^T)^{-1} y$ and $\Sigma_x = (\Sigma_0^{-1} + \frac{1}{\lambda} \Phi^T \Phi)^{-1}$.
Once the hyperparameters $\lambda, \{\gamma_i, B_i\}_{i=1}^g$ are estimated, the
maximum a posteriori (MAP) estimate of x, denoted by $\hat{x}$,
can be directly obtained from the mean of the posterior, i.e.

$$\hat{x} \triangleq \mu_x = \Sigma_0 \Phi^T (\lambda I + \Phi \Sigma_0 \Phi^T)^{-1} y. \tag{4}$$
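The posterior moments above can be computed as in the following minimal sketch. The helper names `block_diag_prior` and `posterior` are hypothetical. The covariance is evaluated in the equivalent form $\Sigma_x = \Sigma_0 - \Sigma_0 \Phi^T \Sigma_y^{-1} \Phi \Sigma_0$ with $\Sigma_y = \lambda I + \Phi \Sigma_0 \Phi^T$ (matrix inversion lemma), a choice made here so that the computation stays well defined when some $\gamma_i$ are exactly zero.

```python
import numpy as np

def block_diag_prior(gammas, Bs):
    """Assemble Sigma_0 = blockdiag(gamma_1 B_1, ..., gamma_g B_g)."""
    M = sum(B.shape[0] for B in Bs)
    Sigma0 = np.zeros((M, M))
    start = 0
    for gamma, B in zip(gammas, Bs):
        stop = start + B.shape[0]
        Sigma0[start:stop, start:stop] = gamma * B
        start = stop
    return Sigma0

def posterior(Phi, y, Sigma0, lam):
    """Posterior mean and covariance of x given y, for fixed hyperparameters."""
    N = Phi.shape[0]
    Sigma_y = lam * np.eye(N) + Phi @ Sigma0 @ Phi.T
    # G = Sigma0 Phi^T Sigma_y^{-1}, obtained with a linear solve instead of an inverse.
    G = np.linalg.solve(Sigma_y, Phi @ Sigma0).T
    mu_x = G @ y                              # posterior mean, as in (4)
    Sigma_x = Sigma0 - G @ Phi @ Sigma0       # posterior covariance
    return mu_x, Sigma_x
```

Working in the N-dimensional measurement space also keeps the linear solve small when N is much smaller than M.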

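The hyperparameters are generally estimated by a Type II
maximum likelihood procedure [18]. This is equivalent to
minimizing the following negative log-likelihood [11] with
respect to each hyperparameter

$$\mathcal{L}(\lambda, \{\gamma_i, B_i\}_{i=1}^g) \triangleq \log|\Sigma_y| + y^T \Sigma_y^{-1} y, \tag{5}$$

where $\Sigma_y = \lambda I + \Phi \Sigma_0 \Phi^T$. A number of optimization
approaches are available for estimating the hyperparameters
[11]. Here we only present the results using the
Expectation-Maximization (EM) method:

$$\gamma_i \leftarrow \frac{1}{d_i} \mathrm{Tr}\bigl[B_i^{-1}\bigl(\Sigma_x^i + \mu_x^i (\mu_x^i)^T\bigr)\bigr], \tag{6}$$

$$\lambda \leftarrow \frac{\|y - \Phi \mu_x\|_2^2 + \sum_{i=1}^g \mathrm{Tr}\bigl(\Sigma_x^i (\Phi^i)^T \Phi^i\bigr)}{N}, \tag{7}$$

$$B_i \leftarrow \mathrm{Toeplitz}([1, r, \cdots, r^{d_i - 1}]), \quad \forall i \tag{8}$$

where $\mu_x^i \in \mathbb{R}^{d_i \times 1}$ is the corresponding i-th block in $\mu_x$,
$\Sigma_x^i \in \mathbb{R}^{d_i \times d_i}$ is the corresponding i-th principal diagonal
block in $\Sigma_x$, and $\Phi^i$ consists of the columns of $\Phi$ associated
with the i-th block. In (8), $r = \mathrm{sign}(\frac{m_1}{m_0}) \min\{|\frac{m_1}{m_0}|, 0.99\}$, where
$m_0 = \sum_{i=1}^g m_0^i$ and $m_1 = \sum_{i=1}^g m_1^i$. Here $m_0^i$ and $m_1^i$ are
the averages of the entries along the main diagonal and the
main sub-diagonal of $B_i$, which is learned by the rule:
$B_i \leftarrow \frac{1}{\gamma_i} [\Sigma_x^i + \mu_x^i (\mu_x^i)^T]$. The resulting algorithm, denoted by
BSBL-EM, then iterates over (4), (6), (7), and (8) until convergence.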
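The iteration just described can be sketched as follows. This is an illustrative reading of the EM updates, not the reference implementation of [11]: the function names, the fixed iteration count, the initialization ($\gamma_i = 1$, $r = 0$), the pruning threshold, and the option to keep $\lambda$ fixed are all assumptions of this sketch. The noise update divides by the number of measurements, i.e. the length of y.

```python
import numpy as np

def toeplitz_ar1(r, d):
    """Return Toeplitz([1, r, ..., r^(d-1)]) as a d x d matrix."""
    idx = np.arange(d)
    return r ** np.abs(idx[:, None] - idx[None, :])

def bsbl_em(Phi, y, block_sizes, lam=1e-6, learn_lam=False, n_iter=200, prune=1e-8):
    """EM-style iterations; returns (mu_x, gammas, r, lam)."""
    N, M = Phi.shape
    edges = np.concatenate(([0], np.cumsum(block_sizes)))
    blocks = [slice(edges[i], edges[i + 1]) for i in range(len(block_sizes))]
    gammas = np.ones(len(blocks))
    r = 0.0
    for _ in range(n_iter):
        # Prior covariance Sigma_0 = blockdiag(gamma_i * B_i) with B_i Toeplitz.
        Sigma0 = np.zeros((M, M))
        for gamma, b in zip(gammas, blocks):
            Sigma0[b, b] = gamma * toeplitz_ar1(r, b.stop - b.start)
        # Posterior mean and covariance for the current hyperparameters.
        Sigma_y = lam * np.eye(N) + Phi @ Sigma0 @ Phi.T
        G = np.linalg.solve(Sigma_y, Phi @ Sigma0).T
        mu = G @ y
        Sigma_x = Sigma0 - G @ Phi @ Sigma0
        # Hyperparameter updates.
        new_gammas = np.zeros_like(gammas)
        m0 = m1 = trace_term = 0.0
        for i, b in enumerate(blocks):
            d = b.stop - b.start
            Phi_i = Phi[:, b]
            trace_term += np.trace(Sigma_x[b, b] @ Phi_i.T @ Phi_i)
            if gammas[i] <= prune:
                continue                          # pruned blocks stay at zero
            C = Sigma_x[b, b] + np.outer(mu[b], mu[b])
            new_gammas[i] = np.trace(np.linalg.solve(toeplitz_ar1(r, d), C)) / d
            B_i = C / gammas[i]                   # unconstrained estimate of B_i
            m0 += np.mean(np.diag(B_i))           # average of the main diagonal
            if d > 1:
                m1 += np.mean(np.diag(B_i, k=1))  # average of the first off-diagonal
        if learn_lam:
            lam = (np.sum((y - Phi @ mu) ** 2) + trace_term) / N
        if m0 > 0:
            ratio = m1 / m0
            r = np.sign(ratio) * min(abs(ratio), 0.99)
        gammas = new_gammas
    return mu, gammas, r, lam
```

In this sketch a block whose $\gamma_i$ falls below the threshold is removed permanently, which is one simple way to realize the pruning behaviour attributed to automatic relevance determination; a practical implementation would also add a convergence test.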

Extensive experiments have shown that the algorithms 
derived from the bSBL framework have the best recovery 
performance among existing algorithms [11] and shed light 
on various aspects of the intra-block correlation structure, 
including benefits of exploiting the correlation, guidance on 
how to modify existing algorithms to exploit the correlation 



[19], modification to deal with block sparsity with unknown 
block partition [11], and applications to problems with less 
sparsity [12]. 

B. Inter-Vector Correlation 

This is the situation in the MMV model (1) where there
is correlation among the entries in each non-zero row of X.
To deal with this situation, we assume the rows $X_{i\cdot}$ ($\forall i$) are
mutually independent, and the density of each $X_{i\cdot}$
is a parameterized multivariate Gaussian, given by

$$p(X_{i\cdot}; \gamma_i, B_i) \sim \mathcal{N}(0, \gamma_i B_i), \quad i = 1, \cdots, M$$

where $\gamma_i$ is a nonnegative hyperparameter controlling the row
sparsity of X. When $\gamma_i = 0$, the associated $X_{i\cdot}$ becomes zero.
$B_i$ is a positive definite matrix that captures the correlation
structure of $X_{i\cdot}$. Note that by letting $y = \mathrm{vec}(Y^T)$,
$D = \Phi \otimes I_L$, $x = \mathrm{vec}(X^T)$, and $v = \mathrm{vec}(V^T)$, we can transform
the MMV model to the following SMV model [10], [13]

$$y = Dx + v,$$

where x has the block partition (2) with $d_i = L$ ($\forall i$).
Therefore, all the algorithms derived from the bSBL framework
[11] can be applied to the MMV model. For more details, the
reader is referred to [13], [20]. For convenience, the resulting
algorithms are together called the T-SBL family. Interestingly,
the role of the correlation structure in the performance of
existing MMV algorithms is found to be quite different from
that of intra-vector correlation [11]. Some explanation for this
observation is provided in Section IV. As in the intra-vector
case, algorithms in the T-SBL family provide insight into how
to modify existing MMV algorithms that operate in the
X-space to incorporate inter-vector correlation [19], [20].
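The transformation from the MMV model to the block-structured SMV model can be checked numerically. The sketch below uses small, arbitrary dimensions chosen only for illustration; it relies on the fact that flattening a matrix row by row gives the vectorization of its transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 5, 8, 3                      # illustrative sizes only
Phi = rng.standard_normal((N, M))
X = rng.standard_normal((M, L))
V = rng.standard_normal((N, L))
Y = Phi @ X + V

def vec_of_transpose(A):
    """vec(A^T): stack the rows of A into one long vector."""
    return A.reshape(-1)               # NumPy's default row-major flatten

D = np.kron(Phi, np.eye(L))            # D = Phi (Kronecker product) I_L
y, x, v = vec_of_transpose(Y), vec_of_transpose(X), vec_of_transpose(V)

assert np.allclose(y, D @ x + v)       # the SMV form of the MMV model
assert np.allclose(x[2 * L:3 * L], X[2])   # block i of x is the i-th row of X
```

Each row of X thus becomes one length-L block of x, which is why a block-sparse prior on x with per-block covariance $\gamma_i B_i$ models correlation within the rows of X.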

In some applications the matrix X has both intra-vector
correlation and inter-vector correlation. This correlation
structure can be exploited as well by extending the bSBL
framework. Assume X can be partitioned into a number of
blocks, and the i-th block consists of $d_i$ rows. Then a key
issue is how to model the correlation structure in each block.
The most general model would involve stacking the rows of a
block and using a $d_i L \times d_i L$ matrix to model the correlation in
this block. But estimating such a model from a small number
of measurement vectors can lead to overfitting and unreliable
estimates. Thus, simplified models are needed, and in this
context the Kronecker model has support from applications.
The overall correlation structure in the i-th block is modeled as
$R^i = R^i_{\mathrm{inter}} \otimes R^i_{\mathrm{intra}}$, where $R^i_{\mathrm{inter}}$ captures the inter-vector correlation
in this block and $R^i_{\mathrm{intra}}$ captures the intra-vector correlation.
Understanding the role of the correlation and how accurately
to model and incorporate correlation is an interesting topic for
future study.
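To make the saving of the Kronecker model concrete, the following sketch builds such a covariance for one block and compares parameter counts. The AR(1)-shaped factor matrices, the sizes, and the column-major vectorization of the block are assumptions made for this illustration only.

```python
import numpy as np

def ar1_corr(rho, n):
    """n x n correlation matrix with entries rho^|i-j| (illustrative choice)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

d, L = 4, 3                                  # rows in the block, measurement vectors
R_intra = ar1_corr(0.9, d)                   # correlation across the d rows
R_inter = ar1_corr(0.7, L)                   # correlation across the L columns
R_block = np.kron(R_inter, R_intra)          # covariance of the column-stacked block

# Draw one d x L block with this covariance: block = A Z C^T with A A^T = R_intra
# and C C^T = R_inter, so that vec(block) = (C kron A) vec(Z).
A = np.linalg.cholesky(R_intra)
C = np.linalg.cholesky(R_inter)
rng = np.random.default_rng(0)
Z = rng.standard_normal((d, L))
block = A @ Z @ C.T

# Free parameters of a symmetric matrix: full model versus Kronecker factors.
full_params = (d * L) * (d * L + 1) // 2
kron_params = d * (d + 1) // 2 + L * (L + 1) // 2
print(full_params, kron_params)              # 78 16
```

Even for this small block the Kronecker factorization reduces the number of free covariance parameters from 78 to 16, which is the motivation for preferring it when only a few measurement vectors are available.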

C. Time-Varying Sparsity Model 

The time-varying sparsity model is a natural extension of 
the MMV model. It considers the case when the support 
of each column of X is time-varying. The transition from 
the stationary models, assumed so far, to the non-stationary 



situation opens up an abundance of options akin to past work 
on tracking which has led to adaptive filters, Kalman Filters 
and so on. 

The measurement model in this case is given by

$$y_t = \Phi x_t + v_t, \quad t = 0, 1, 2, \cdots \tag{9}$$

Here, $y_t \in \mathbb{R}^{N \times 1}$ is a measurement vector, $x_t \in \mathbb{R}^{M \times 1}$ is
the sparse signal with time-varying sparsity, and $v_t$ is a noise
vector.

A model for generating signals x t with time-varying spar- 
sity is needed both for developing optimal algorithms and 
for systematic evaluation of algorithms developed. Drawing 
inspiration from applications like neuroelectromagnetic source 
localization, the measurement data can be viewed as being 
generated by a sequence of events leading to an approximate 
piecewise stationary model. Each stationary segment leads 
to an MMV model, which involves a sparsity pattern and 
a multivariate time series for the nonzero entries that lasts 
a certain duration. The time series may be modeled as a
multivariate random signal with certain statistical properties 
or a deterministic model. For the statistical case, one can 
use a multivariate AR process to model the signal. For the 
deterministic case, one can assume it is the response of a 
dynamical system to an impulse input, e.g. a set of second 
order difference equations. 

The transition from event to event may be completely 
random or structured. Completely random means the sparsity 
pattern changes in an independent manner and the number 
of non-zero entries at a given time always lies in a given 
range. Structured means that the sparsity change is more 
gradual, i.e. few entries get turned off and a new set of small 
entries are turned on potentially in an asynchronous manner. 
A model with such reasonable flexibility will be very useful 
for generation of data and testing of algorithms. 

To deal with time-varying sparsity, several algorithms have
been proposed, such as SOB-M-FOCUSS [17], message passing
algorithms [16], and Least-Squares Compressed Sensing
(LS-CS) [15]. When the support of $x_t$ changes slowly, we
can view such a time-varying sparsity model as a concatenation
of several MMV models [19], where in each MMV
model the support does not change. Therefore, algorithms
in the T-SBL family can be used in this model. Note that
here exploiting the multiple measurement vectors is important
because of the enhanced support-recovery ability afforded by
the MMV model, as discussed in Section IV. We
illustrate this benefit in Section V.
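The concatenation strategy can be sketched as follows. The helper names are hypothetical, and `ridge_mmv` is only a stand-in solver used to keep the example self-contained: it is a plain regularized least-squares estimate, not a sparse recovery algorithm and not a member of the T-SBL family. Any MMV solver with the same call signature could be substituted.

```python
import numpy as np

def segment_columns(Y, seg_len):
    """Split the columns of Y into consecutive segments of at most seg_len columns."""
    T = Y.shape[1]
    return [Y[:, t:t + seg_len] for t in range(0, T, seg_len)]

def recover_piecewise(Y, Phi, seg_len, mmv_solver):
    """Treat each segment as an MMV problem and concatenate the estimates."""
    estimates = [mmv_solver(Phi, Y_seg) for Y_seg in segment_columns(Y, seg_len)]
    return np.hstack(estimates)

def ridge_mmv(Phi, Y, lam=1e-2):
    """Placeholder solver (assumption): minimum-energy ridge estimate of X."""
    N = Phi.shape[0]
    return Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(N), Y)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((60, 256))
Y = rng.standard_normal((60, 50))          # stands in for 50 measurement vectors
X_hat = recover_piecewise(Y, Phi, seg_len=5, mmv_solver=ridge_mmv)
print(len(segment_columns(Y, 5)), X_hat.shape)   # 10 (256, 50)
```

The segment length trades off the two effects discussed above: longer segments supply more measurement vectors per MMV problem, while shorter segments make the assumption of a fixed support within a segment more accurate.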

IV. Limits of Support Recovery 

An interesting question is the limits of sparse signal recov- 
ery algorithms, i.e., under what conditions is any algorithm 
capable of recovering the locations of the non-zero entries. 
Such results can potentially also be useful in understanding the
role of the correlation structure in the support recovery task.
Previous literature discussing the performance limits of sparse 
signal recovery can be divided into two categories. The first 
category of analysis focuses on the performance of practical 



algorithms [3], [21]-[26]. The second category of performance 
analysis focuses on the performance limits of the theoretical 
algorithms with combinatorial complexity [27]-[30]. In this 
paper, we consider the information theoretic performance limit 
of support recovery that governs any algorithm, which belongs 
to the second category as described above. 

Let W denote a matrix with all elements being non-zero.
Define the generative model for the sparse signal X as

$$X_{s\cdot} = \begin{cases} W_{j\cdot} & \text{if } s = S_j, \\ 0 & \text{if } s \notin \{S_1, \ldots, S_K\}. \end{cases} \tag{10}$$

The support of X, denoted by supp(X), is the set of
indices corresponding to the non-zero rows of X, i.e.,
$\mathrm{supp}(X) = \{S_1, \ldots, S_K\}$. According to the signal model (10),
$|\mathrm{supp}(X)| = K$. We assume K is known.

Upon observing the noisy measurement Y, the goal is to
recover the indices of the non-zero rows of X. A support
recovery map is defined as

$$d: \mathbb{R}^{N \times L} \rightarrow 2^{[M]}. \tag{11}$$

We further define the average probability of error by

$$P\{d(Y) \neq \mathrm{supp}(X(W, S))\}$$

for each (unknown) signal value matrix $W \in \mathbb{R}^{K \times L}$. Note
that the probability is averaged over the randomness of the
locations of the non-zero rows S, the measurement matrix $\Phi$,
and the measurement noise V.

We consider the support recovery of a sequence of sparse
signals generated with the same signal value matrix W. In
particular, we assume that K and L are fixed. Define the
auxiliary quantity

$$c(W) = \min_{T \subseteq [K]} \frac{1}{2|T|} \log\det\bigl(I + \ldots\bigr) \tag{12}$$

where $W_T$ denotes a matrix formed by appropriately choosing
a set of rows indexed by T from W. The following two
theorems summarize the performance limits in support
recovery of sparse signals. The notation $N_M$ implies the possible
dependency between N and M.
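Theorem 1: If

$$\limsup_{M \to \infty} \frac{\log M}{N_M} < c(W) \tag{13}$$

then there exists a sequence of support recovery maps
$\{d^{(M)}\}_{M=K}^{\infty}$, $d^{(M)}: \mathbb{R}^{N_M \times L} \rightarrow 2^{[M]}$, such that

$$\lim_{M \to \infty} P\{d(Y) \neq \mathrm{supp}(X(W, S))\} = 0. \tag{14}$$

Theorem 2: If

$$\limsup_{M \to \infty} \frac{\log M}{N_M} > c(W) \tag{15}$$

then for any sequence of support recovery maps
$\{d^{(M)}\}_{M=K}^{\infty}$, $d^{(M)}: \mathbb{R}^{N_M \times L} \rightarrow 2^{[M]}$,

$$\liminf_{M \to \infty} P\{d(Y) \neq \mathrm{supp}(X(W, S))\} > 0. \tag{16}$$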



Theorems 1 and 2 together indicate that $N = \frac{1}{c(W) \pm \epsilon} \log M$
is the sufficient and necessary number of measurements per 



measurement vector to ensure asymptotically successful sup- 
port recovery. The constant c(W) explicitly captures the role 
of the non-zero entries in the performance tradeoff. 

To understand the result and its implication, we need to 
examine the structure of the non-zero matrix W. If
L < K, then the quantity c(W), under mild assumptions on
the non-zero entries, grows linearly with L [31]. This fact
indicates that support recovery in the MMV problem benefits
greatly from additional measurement vectors. Meanwhile,
Theorems 1 and 2 characterize the role of each non-zero entry 
in the matrix X in the performance limit of support recovery 
of sparse signals. Indeed, adding different measurement vec- 
tors may cause drastically different performance gains. As a 
special case, when the columns of the non-zero signal matrix 
W are identical, the performance gain of having MMV is 
equivalent to merely reducing the noise level by a factor of L. 
However, by properly constructing a matrix W with certain 
rank conditions imposed on its submatrices, the performance 
limit of support recovery can enjoy a much larger gain as a 
result of, in the language of MIMO wireless communication, 
a multiplexing gain. For the SMV block sparsity model where 
L = 1, no such benefit accrues. However, the norm of the 
blocks contributes to a signal-to-noise ratio gain. It is useful 
to note the analysis so far is conducted with a fixed W. For 
random non-zero entries, one can use the results in the two 
theorems above as the instantaneous capacity and conduct an 
outage analysis [30], [31]. In the context of random entries, the 
blocks, under mild assumptions, provide a diversity gain that 
greatly improves the performance of block sparsity algorithms 
with known block size [32]. 
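The identical-column special case mentioned above can be made concrete with a short calculation. The derivation below assumes, beyond what is stated in the text, that the noise columns $V_{\cdot l}$ are independent zero-mean Gaussian vectors with common covariance $\lambda I$.

```latex
% Assumption: the columns of W are identical, so every column of X equals the same sparse vector x.
\begin{align}
  Y_{\cdot l} &= \Phi x + V_{\cdot l}, \qquad l = 1, \ldots, L, \\
  \bar{y} \triangleq \frac{1}{L} \sum_{l=1}^{L} Y_{\cdot l}
          &= \Phi x + \bar{v}, \qquad \bar{v} = \frac{1}{L} \sum_{l=1}^{L} V_{\cdot l}, \\
  \mathrm{Cov}(\bar{v}) &= \frac{1}{L^2} \sum_{l=1}^{L} \lambda I = \frac{\lambda}{L} I .
\end{align}
```

Under this Gaussian assumption the average $\bar{y}$ is a sufficient statistic for x, so the L measurement vectors carry the same information as a single SMV measurement whose noise variance is reduced from $\lambda$ to $\lambda / L$.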

V. Experiments 

Three representative experiments were performed. Each
experiment is based on 500 trials. In each trial the matrix
$\Phi \in \mathbb{R}^{N \times M}$ was generated as a Gaussian random matrix
with unit-norm columns. We chose the MSE as the
performance index in noisy experiments, and the success rate
as the performance index in noiseless experiments. The success
rate was defined as the ratio of the number of successful
trials to the total number of trials, where a successful trial
was defined as one in which MSE < $10^{-6}$.
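The two performance indices can be computed as in the sketch below. The text does not spell out how the MSE is normalized; the normalization by the energy of the true signal used here is an assumption of this example, as are the function names.

```python
import numpy as np

def normalized_mse(X_hat, X_true):
    """Squared error normalized by the energy of the true signal (assumed definition)."""
    return np.linalg.norm(X_hat - X_true) ** 2 / np.linalg.norm(X_true) ** 2

def success_rate(mse_values, tol=1e-6):
    """Fraction of trials whose MSE falls below the success threshold."""
    mse_values = np.asarray(mse_values, dtype=float)
    return float(np.mean(mse_values < tol))

# Four hypothetical trial outcomes: two below the threshold, two above it.
print(success_rate([1e-8, 1e-3, 0.0, 2e-6]))   # 0.5
```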

Experiment 1: Effect of Intra-Block Correlation in an
SMV Model. In this noiseless experiment we studied the
effect of intra-block correlation using the BSBL-EM
algorithm presented in Section III-A. The matrix $\Phi$ was of
size 100 × 300. The sparse signal x consisted of 75 blocks of
identical size. Only 20 of the blocks were non-zero. Entries
in every non-zero block were modeled as an AR(1) process
with the same AR coefficient $\beta$. $\beta$ assumed values ranging
from -0.99 to 0.99. The experiment was then repeated for each
value of $\beta$. BSBL-EM was performed in two ways, namely
adaptively learning the intra-block correlation and completely
ignoring the correlation (i.e., setting $B_i = I$ ($\forall i$)).
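A signal of the kind used in this experiment can be generated as in the following sketch. The function names and the unit-variance stationary scaling of the AR(1) process are assumptions of this example; the text specifies only the block layout and the AR coefficient.

```python
import numpy as np

def ar1_block(beta, d, rng):
    """d consecutive samples of a unit-variance AR(1) process with coefficient beta."""
    x = np.empty(d)
    x[0] = rng.standard_normal()
    for n in range(1, d):
        # Innovation scaled so that the process variance stays equal to one.
        x[n] = beta * x[n - 1] + np.sqrt(1.0 - beta ** 2) * rng.standard_normal()
    return x

def block_sparse_signal(M=300, g=75, k=20, beta=0.9, seed=0):
    """Length-M signal with g equal-size blocks, k of which are non-zero AR(1) blocks."""
    rng = np.random.default_rng(seed)
    d = M // g                                   # block size (4 for 300 / 75)
    x = np.zeros(M)
    active = np.sort(rng.choice(g, size=k, replace=False))
    for b in active:
        x[b * d:(b + 1) * d] = ar1_block(beta, d, rng)
    return x, active

x, active = block_sparse_signal()
print(x.shape, len(active), np.count_nonzero(x))   # (300,) 20 80
```

With this scaling, the covariance of each non-zero block is the Toeplitz matrix with first row $[1, \beta, \ldots, \beta^{d-1}]$, which matches the structure imposed on $B_i$ in (8).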

The result (Fig. 1(a)) clearly shows that when correlation
is exploited, the performance of BSBL-EM improves as
the correlation increases. However, when the correlation is
not exploited, the performance is unchanged with correlation.
Note that the latter phenomenon was also observed for
existing algorithms which do not exploit the correlation [11].

Fig. 1. (a) Effects of intra-block correlation on algorithm performance. (b)
Effects of inter-vector correlation on algorithm performance.

Experiment 2: Effect of Inter-Vector Correlation in an
MMV Model. Next we studied the effect of inter-vector
correlation in a noiseless MMV experiment, where N = 25,
M = 125, L = 4 and the number of nonzero rows of X was 18. The
inter-vector correlation values were chosen from the range
-0.99 to 0.99, and the experiment was repeated for each of the
values. The T-MSBL algorithm [13], a member of the T-SBL
family introduced in Section III-B, was carried out to show the
benefit from exploiting the correlation. For comparison, two
typical MMV algorithms which do not exploit the correlation,
namely M-SBL [33] and Group-Lasso [9] (the variant for the
MMV model), were also performed. Note that if T-MSBL is
forced not to exploit the inter-vector correlation (i.e., setting
$B_i = I$ ($\forall i$)), it reduces to the M-SBL algorithm.

The result (Fig. 1(b)) shows that when the inter-vector
correlation increases, T-MSBL has improved performance, but
the two compared algorithms have degradation in performance.

Experiment 3: Time-Varying Sparsity Model. We
conducted a noisy experiment to verify our strategy for treating a
time-varying sparsity model, as stated in Section III-C. $\Phi$ was
of size 60 × 256. X had 50 columns. The
number of nonzero rows, K, during the first 15 columns of
X was 15. K was increased by 10 starting from the 16th to
the 31st column of X. Also, starting from the 26th column,
5 existing nonzero rows were set to zero. Each nonzero row
was modeled as an AR(1) process with the AR coefficient
varying from 0.7 to 0.99, and had a duration of at most 20
columns. The SNR was 20 dB.

T-MSBL, M-SBL, SOB-M-FOCUSS, and LS-CS were
compared. SOB-M-FOCUSS treats a time-varying sparsity
model as a series of overlapped MMV models and exploits
smoothness in amplitudes of non-zero entries of $x_t$ over
a short interval. For this algorithm, we set the length of
each MMV model to 5, and set the overlapping rate to 0.5.
Its smoothing matrix was a second-order smoothing matrix
given in [13]. LS-CS is an algorithm which exploits neither
the benefit of multiple measurement vectors nor the
inter-vector correlation. SOB-M-FOCUSS and LS-CS were given
the true noise variance, while both T-MSBL and M-SBL
learned the noise variance. When performing T-MSBL and
M-SBL, we approximated the time-varying sparsity model
in two ways. One was using the concatenation of 25 MMV
models with each MMV model containing 2 columns. The
second was using 10 MMV models with each containing 5
columns. Fig. 2 shows the advantages of exploiting multiple
measurement vectors (by comparing T-MSBL, M-SBL, and
SOB-M-FOCUSS to LS-CS), of exploiting the inter-vector
correlation (by comparing T-MSBL to M-SBL), and of adaptively
learning the correlation (by comparing T-MSBL to
SOB-M-FOCUSS).

Fig. 2. Performance comparison in the experiment with time-varying sparsity
(horizontal axis: column index).

VI. Conclusion 

This paper discussed the problem of sparse signal recovery
when there is correlation in the values of the non-zero entries.
We reviewed both intra-vector correlation in the context of the
block sparse model and inter-vector correlation in the context
of the multiple measurement vector model. We discussed
how the sparse Bayesian learning framework can effectively
incorporate correlation at the algorithm level. The impact of
correlation on the limits of support recovery was also discussed.
Since applications involving sparsity are likely to be endowed
with additional structure, incorporating structure motivated by
applications and exploiting it to develop algorithms as well
as to improve recovery performance holds much promise.